Quantitative methods 1:
Data collection

SKOC39: Introduction to research methods
and academic writing

nils.holmberg@isk.lu.se

Presentation

  • Nils Holmberg
  • Computational
    content analysis
  • Cognitive
    communication effects

Quantitative methods

    1. Experiments and Threats to Validity
    2. Survey Research and Questionnaires
    3. Quantitative Content Analysis

Course literature

Boyle and Schmierbach (2015)

Boyle and Schmierbach (2019)

Lectures and workshops

Data collection (Nov 12)

    1. Concept Explication and Measurement
    2. Reliability and Validity
    3. Effective Measurement
    4. Sampling
    5. Content Analysis

Exam question 1

Data analysis (Nov 26)

    1. Experiments and Threats to Validity
    2. Survey Research
    3. Descriptive Statistics
    4. Inferential Statistics
    5. Multivariate Statistics

Exam question 2

5. Concept Explication and Measurement

  • Conceptual Definition (p. 104)
  • Operational Definition (p. 107)
  • Dimension (p. 105)
  • Indicator (p. 108)
  • Nominal-Level Measure (p. 111)
  • Ordinal-Level Measure (p. 112)
  • Interval-Level Measure (p. 113)
  • Ratio-Level Measure (p. 114)
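
As a quick illustration (not from the textbook), the four levels of measurement differ in which comparisons they support; a minimal Python sketch with invented example variables:

```python
# Nominal: categories with no order -- only equality checks are meaningful.
party_a, party_b = "green", "liberal"
assert party_a != party_b

# Ordinal: ordered categories -- ranking is meaningful, distances are not.
education_rank = {"primary": 1, "secondary": 2, "tertiary": 3}
assert education_rank["secondary"] > education_rank["primary"]

# Interval: equal intervals but no true zero (e.g., temperature in Celsius)
# -- differences are meaningful, ratios are not.
temp_diff = 30 - 20  # a 10-degree difference is interpretable

# Ratio: true zero point (e.g., minutes of media use)
# -- ratios are meaningful: 60 minutes is twice 30 minutes.
assert 60 / 30 == 2
print(temp_diff)  # 10
```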

Conceptual Definition (p. 104)

  • Conceptual Definition:
    • A detailed, theoretical explanation of a concept, outlining what it means and its scope.
  • Operational Definition:
    • A specific, measurable way of defining a concept for use in empirical research.
  • Dimension:
    • A distinct, measurable aspect of a concept; for “social media engagement,” dimensions could include likes, comments, shares, and time spent.
  • Indicator:
    • An observable and measurable variable that reflects a specific dimension of a concept.
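
A hypothetical sketch of how these terms fit together when operationalizing “social media engagement”; the indicator names and the unit-weighted index are illustrative assumptions, not definitions from the textbook:

```python
# Each key is an indicator for one dimension of the concept.
post = {"likes": 120, "comments": 15, "shares": 8}

# One possible operational definition: the sum of the indicators.
engagement_score = sum(post.values())
print(engagement_score)  # 143
```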

Operational Definition (p. 107)

Dimension (p. 105)

Indicator (p. 108)

Nominal-Level Measure (p. 111)

Ordinal-Level Measure (p. 112)

Interval-Level Measure (p. 113)

Ratio-Level Measure (p. 114)

6. Reliability and Validity

  • Random Error (p. 125)
  • Systematic Error (p. 126)
  • Test-Retest Reliability (p. 131)
  • Content and Construct Validity (p. 132)
  • External Validity/Generalizability (p. 139)

Random Error (p. 125)

Systematic Error (p. 126)

Test-Retest Reliability (p. 131)

Content and Construct Validity (p. 132)

External Validity/Generalizability (p. 139)

7. Effective Measurement

  • Direct and Indirect Observation (p. 155)
  • Self-Report (p. 156)
  • Questionnaire Design (p. 162)
  • Closed-Ended (p. 170)
  • Exhaustive (p. 172)
  • Mutually Exclusive (p. 173)
  • Leading Questions (p. 174)
  • Social Desirability (p. 175)
  • Clarity (p. 176)
  • Double-Barreled (p. 177)
  • Likert-Type Items (p. 178)

Direct and Indirect Observation (p. 155)

  • Definition:
    • Direct observation involves the researcher watching and recording behaviors or phenomena as they occur.
    • Indirect observation relies on records, artifacts, or reports rather than firsthand observation.
  • Explanation:
    • Direct observation is often used in natural settings to minimize interference.
    • Indirect observation is useful when direct observation is not feasible (e.g., historical records).
  • Examples:
    • Direct: Observing traffic patterns at an intersection.
    • Indirect: Analyzing security footage to understand customer behaviors.

Self-Report (p. 156)

  • Definition:
    • A method where respondents provide data about themselves, often through surveys or interviews.
  • Explanation:
    • Common in psychological and sociological studies to understand attitudes, beliefs, or experiences.
    • Relies on the honesty and self-awareness of participants.
  • Examples:
    • Surveys about consumer satisfaction.
    • Diaries tracking daily physical activity.

Questionnaire Design (p. 162)

  • Definition:
    • A set of written or digital questions used to collect data systematically.
  • Explanation:
    • Can include open-ended, closed-ended, or a mix of question types.
    • Suitable for large-scale data collection due to efficiency.
  • Examples:
    • Online polls measuring user satisfaction.
    • Employee feedback surveys conducted anonymously.

Closed-Ended (p. 170)

Exhaustive (p. 172)

  • Definition:
    • A characteristic of response options where all possible answers are covered.
  • Explanation:
    • Ensures that every respondent can find an appropriate option.
    • Reduces frustration and increases accuracy.
  • Examples:
    • Adding an “Other (please specify)” option in a list of job categories.
    • Including all demographic groups in a survey about income levels.

Mutually Exclusive (p. 173)

  • Definition:
    • A characteristic where response options do not overlap.
  • Explanation:
    • Prevents confusion and ensures respondents can select only one correct option.
  • Examples:
    • Age ranges like 18–24, 25–34 (not 18–25, 25–35).
    • Income brackets clearly separated by ranges without overlap.
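
A small sketch of what mutually exclusive, exhaustive response options look like in practice; the bracket labels follow the age-range example above, with “under 18” and “35+” added (an assumption) to make the set exhaustive:

```python
def age_bracket(age):
    """Assign a respondent to exactly one non-overlapping bracket."""
    if age < 18:
        return "under 18"
    if age <= 24:
        return "18-24"
    if age <= 34:
        return "25-34"
    return "35+"

# 24 and 25 fall in different brackets -- no overlap, and every age fits somewhere.
print(age_bracket(24), age_bracket(25))  # 18-24 25-34
```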

Leading Questions (p. 174)

  • Definition:
    • Questions that subtly influence respondents to answer in a specific way.
  • Explanation:
    • Can bias results and reduce the validity of data.
    • Should be avoided in neutral research designs.
  • Examples:
    • “Don’t you think our product is amazing?”
    • “How much do you love our new feature?”

Social Desirability (p. 175)

  • Definition:
    • The tendency of respondents to answer in a way they believe is socially acceptable or favorable.
  • Explanation:
    • Can lead to overreporting of positive behaviors and underreporting of negative ones.
  • Examples:
    • Claiming higher frequency of exercise than actual.
    • Underreporting alcohol consumption in health surveys.

Clarity (p. 176)

  • Definition:
    • The quality of being easily understood, crucial for effective question design.
  • Explanation:
    • Avoids ambiguity and ensures respondents interpret questions as intended.
  • Examples:
    • Clear: “How many times do you exercise per week?”
    • Unclear: “Do you exercise regularly?”

Double-Barreled (p. 177)

  • Definition:
    • A question that asks about two or more issues at once, making it difficult to answer accurately.
  • Explanation:
    • Should be avoided as it confuses respondents and invalidates data.
  • Examples:
    • “Do you like our product and recommend it to others?”
    • Better: Separate into “Do you like our product?” and “Would you recommend it to others?”

Likert-Type Items (p. 178)

  • Definition:
    • Questions that ask respondents to rate their level of agreement, frequency, or intensity on a scale.
  • Explanation:
    • Typically uses a 5- or 7-point scale, enabling nuanced responses.
    • Useful for measuring attitudes or opinions.
  • Examples:
    • “I am satisfied with the service: Strongly Disagree – Strongly Agree.”
    • “How often do you use the product? Never – Very Often.”
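
Likert-type responses are typically coded numerically before analysis; a minimal sketch assuming a 5-point agreement scale (the response data is invented for illustration):

```python
# Map scale labels to numeric codes (1 = Strongly Disagree ... 5 = Strongly Agree).
scale = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3,
         "Agree": 4, "Strongly Agree": 5}

responses = ["Agree", "Strongly Agree", "Neutral", "Agree"]
codes = [scale[r] for r in responses]

# Mean score as a simple summary of attitude strength.
print(sum(codes) / len(codes))  # 4.0
```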

8. Sampling

  • Representative Sample (p. 189)
  • Response Rate (p. 190)
  • Sample Size (p. 192)
  • Margin of Error (p. 200)
  • Convenience Sampling (p. 204)
  • Snowball Sampling (p. 207)
  • Simple Random Sampling (p. 211)
  • Stratified Sampling (p. 213)
  • Systematic Sampling (p. 214)

Population and Sample

Representative Sample (p. 189)

  • Definition:
    • A subset of a population that accurately reflects the characteristics of the whole population.
  • Explanation:
    • Ensures that findings from the sample can be generalized to the larger population.
    • Achieved through careful selection methods, avoiding bias.
  • Examples:
    • A representative sample for a national survey might include participants from various age groups, genders, and regions.
    • Political polls that proportionally include urban, suburban, and rural voters.

Response Rate (p. 190)

  • Definition:
    • The percentage of people who complete a survey out of those who were invited to participate.
  • Explanation:
    • Higher response rates increase the reliability and representativeness of the data.
    • Low response rates may indicate nonresponse bias, where certain groups are underrepresented.
  • Examples:
    • A 70% response rate in a customer satisfaction survey is considered excellent.
    • Response rates for online surveys are often lower than for phone interviews.

Sample Size (p. 192)

  • Definition:
    • The number of individuals or units included in a sample.
  • Explanation:
    • Larger sample sizes generally provide more accurate results and reduce the margin of error.
    • Determining the appropriate sample size depends on the study’s goals, population size, and desired precision.
  • Examples:
    • A study on consumer preferences with a sample size of 1,000 participants.
    • A clinical trial with a smaller sample size due to high costs or limited availability of patients.

Margin of Error (p. 200)

  • Definition:
    • The range within which the true value for the population is likely to fall, given the sample data.
  • Explanation:
    • Smaller margins of error indicate greater precision in the results.
    • Dependent on sample size and variability in the population.
  • Examples:
    • A survey reporting 60% approval with a margin of error of ±3%.
    • Election polls with a margin of error of ±2%, indicating greater reliability.
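
The ±3% figure above can be checked with the standard formula for the margin of error of a proportion, assuming a 95% confidence level (z = 1.96) and simple random sampling:

```python
import math

def margin_of_error(p, n, z=1.96):
    """moe = z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

# A survey of 1,067 respondents reporting 60% approval:
moe = margin_of_error(0.60, 1067)
print(round(moe, 3))  # 0.029, i.e. roughly +/-3 percentage points
```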

Convenience Sampling (p. 204)

8. Sampling (Non-Probability)

  • Definition:
    • A sampling method where participants are selected based on their availability and accessibility.
  • Explanation:
    • Easy to implement but often lacks representativeness and introduces bias.
  • Examples:
    • Surveying students in a classroom because they are readily available.
    • Collecting data from customers visiting a store during a specific time frame.

Snowball Sampling (p. 207)

  • Definition:
    • A sampling technique where existing participants recruit additional participants from their networks.
  • Explanation:
    • Useful for studying hard-to-reach populations but may lead to sampling bias.
  • Examples:
    • Researching social behaviors in underground communities by having initial participants invite others.
    • Studying experiences of rare disease patients through patient advocacy groups.

Simple Random Sampling (p. 211)

8. Sampling (Probability)

  • Definition:
    • A sampling method where every member of the population has an equal chance of being selected.
  • Explanation:
    • Ensures unbiased selection and representativeness but can be resource-intensive for large populations.
  • Examples:
    • Drawing names from a hat to select participants.
    • Using a random number generator to pick survey respondents.
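
The random-number-generator example can be sketched with Python's standard library; the population of 100 numbered units is invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the draw is reproducible

population = list(range(1, 101))        # sampling frame of 100 units
sample = random.sample(population, 10)  # each unit has an equal chance of selection

print(sorted(sample))
```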

Stratified Sampling (p. 213)

  • Definition:
    • A sampling method that divides the population into subgroups (strata) and selects samples from each subgroup.
  • Explanation:
    • Ensures representation of key subgroups, improving accuracy in estimating population parameters.
  • Examples:
    • Dividing a population by age groups and sampling proportionally from each group.
    • Ensuring representation of urban, suburban, and rural areas in a political poll.
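
A sketch of proportionate stratified sampling; the urban/suburban/rural strata sizes and the 10% sampling fraction are illustrative assumptions:

```python
import random

random.seed(2)  # reproducible draws

strata = {
    "urban":    [f"u{i}" for i in range(60)],
    "suburban": [f"s{i}" for i in range(30)],
    "rural":    [f"r{i}" for i in range(10)],
}

# Draw 10% from each stratum so the sample mirrors the population mix.
sample = []
for units in strata.values():
    sample += random.sample(units, max(1, len(units) // 10))

print(len(sample))  # 6 + 3 + 1 = 10 units
```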

Systematic Sampling (p. 214)

  • Definition:
    • A method where every nth individual or unit is selected from a list or population.
  • Explanation:
    • Simple to implement and ensures even distribution, but may introduce bias if the list has hidden patterns.
  • Examples:
    • Selecting every 10th customer from a queue for a feedback survey.
    • Picking every 5th household from a neighborhood list for a census.
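
The every-nth-unit rule is a one-liner with list slicing; the list of 100 customers and the interval of 10 are invented for illustration:

```python
import random

random.seed(3)
population = list(range(1, 101))  # e.g., 100 customers in a queue

k = 10                        # sampling interval
start = random.randrange(k)   # random starting point within the first interval
sample = population[start::k] # then every 10th unit

print(len(sample))  # 10
```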

11. Content Analysis

  • Intercoder Statistics (p. 294)
  • Challenge of Coding (p. 295)
  • Social Artifacts (p. 297)
  • Latent Content (p. 298)
  • Manifest Content (p. 299)

Intercoder Statistics (p. 294)

  • Definition:
    • Metrics used to measure the agreement or reliability among coders when analyzing content. Common statistics include Cohen’s Kappa and Krippendorff’s Alpha.
  • Explanation:
    • Ensures that coding is consistent and not overly influenced by individual bias.
    • High intercoder reliability indicates that the coding scheme is well-defined and reproducible.
  • Examples:
    • Using Krippendorff’s Alpha to assess agreement on whether social media posts are categorized as positive, neutral, or negative.
    • Measuring intercoder reliability when coding news articles for instances of bias.
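
As a sketch of how such a statistic works, Cohen's kappa for two coders can be computed from scratch; the example codings are invented for illustration:

```python
from collections import Counter

coder_a = ["pos", "pos", "neu", "neg", "pos", "neu", "neg", "neg", "pos", "neu"]
coder_b = ["pos", "neu", "neu", "neg", "pos", "neu", "neg", "pos", "pos", "neu"]

n = len(coder_a)
# Observed agreement: share of items both coders coded identically.
p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Expected agreement by chance, from each coder's marginal distribution.
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2

# Kappa corrects observed agreement for chance agreement.
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.7
```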

Challenge of Coding (p. 295)

  • Definition:
    • Difficulties faced in the process of assigning codes to content, often due to ambiguity or subjectivity.
  • Explanation:
    • Challenges arise from unclear coding guidelines, complex content, or coder biases.
    • Effective training, clear definitions, and pilot testing can mitigate these challenges.
  • Examples:
    • Ambiguity in defining what constitutes “hate speech” in social media posts.
    • Coders interpreting “aggressive tone” differently when analyzing political debates.

Social Artifacts (p. 297)

  • Definition:
    • Products of social interaction or behavior that can be analyzed in content analysis, such as texts, images, videos, or cultural symbols.
  • Explanation:
    • These artifacts serve as tangible evidence of social processes, values, or norms.
    • Studying them helps researchers understand underlying societal patterns.
  • Examples:
    • Analyzing advertisements as social artifacts to understand gender role representation.
    • Studying tweets as artifacts of public opinion on political issues.

Latent Content (p. 298)

  • Definition:
    • The underlying meaning or implicit themes within a piece of content.
  • Explanation:
    • Involves subjective interpretation to identify hidden messages or connotations.
    • More nuanced and requires greater coder training compared to manifest content.
  • Examples:
    • Analyzing latent content in a political speech to identify ideological bias.
    • Identifying implicit cultural values in films or television shows.

Manifest Content (p. 299)

  • Definition:
    • The explicit, observable content within a message, such as words, phrases, or images.
  • Explanation:
    • Easier to quantify and analyze as it requires minimal interpretation.
    • Often used in the initial stages of content analysis before delving into latent content.
  • Examples:
    • Counting the number of times the word “freedom” appears in political speeches.
    • Analyzing the frequency of product placements in a series of movies.
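
The word-counting example above takes only a few lines; the speech snippet is made up for illustration:

```python
import re

speech = ("Freedom is not given; freedom is won. "
          "We stand for freedom of speech and freedom of thought.")

# Count whole-word, case-insensitive occurrences of "freedom".
count = len(re.findall(r"\bfreedom\b", speech, flags=re.IGNORECASE))
print(count)  # 4
```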

Next steps

Workshop 1, Nov 19

Lecture 2, Nov 26

References

Boyle, Michael, and Mike Schmierbach. 2015. Applied Communication Research Methods: Getting Started as a Researcher. Routledge.
———. 2019. Applied Communication Research Methods: Getting Started as a Researcher. Routledge.
